Epidemic trend analysis of SARS‐CoV‐2 in South Asian Association for Regional Cooperation countries using modified susceptible‐infected‐recovered predictive model

Abstract A novel coronavirus that causes a severe and sometimes fatal respiratory syndrome was first identified in China, has since produced outbreaks in more than 200 countries around the world, and has become a pandemic. In this article, a modified version of the well-known susceptible-infected-recovered (SIR) epidemic model is used to analyze the course of the COVID-19 epidemic in the eight member countries of the South Asian Association for Regional Cooperation (SAARC). To this end, the parameters of the SIR model are identified from publicly available data for each country: Afghanistan, Bangladesh, Bhutan, India, the Maldives, Nepal, Pakistan, and Sri Lanka. Based on the prediction model, we estimate the epidemic trend of the COVID-19 outbreak in SAARC countries over 20, 90, and 180 days. This short-, mid-, and long-term prediction model is designed to capture the early dynamics of the COVID-19 epidemic in the South Asian region. The maximum and minimum basic reproduction numbers ($R_0$ = 1.33 and 1.07) among SAARC countries are predicted for Pakistan and Bhutan, respectively. We compare the simulation results with reported data on the COVID-19 outbreak in the SAARC countries and predict different scenarios using the modified SIR prediction model. Our results should provide policymakers with a method for evaluating the impact of possible interventions, including lockdown and social distancing, as well as testing and contact tracing.


INTRODUCTION
A novel coronavirus (SARS-CoV-2), which causes the disease COVID-19, led to an outbreak in the city of Wuhan, Hubei Province, China, in late December 2019 that was linked to the Huanan Seafood Wholesale Market. 1,2 Since then, more than 200 countries around the world have been affected by the novel coronavirus. As of May 30, 2020, according to the World Health Organization (WHO), 5,817,385 confirmed cases had been reported globally, with 362,705 deaths. 3 Of these, the south-east Asia region accounted for 4.2% of confirmed cases, with a case fatality ratio of 2.86%. The very first COVID-19 patient was treated for being coronavirus positive in Wuhan, China, at the beginning of December 2019. In the south-east Asia region, the very first coronavirus-infected patient was detected in Pakistan on February 26, 2020. 4 On January 30, 2020, WHO declared the outbreak a Public Health Emergency of International Concern, and on March 11, 2020, it characterized COVID-19 as a pandemic. 5 To mitigate the spread of COVID-19, the affected countries have taken various measures, including citywide lockdowns, social distancing, traffic halts, community management, and health education. More importantly, the outbreak of COVID-19 has posed a massive threat to global health and to economies all over the world. One significant feature of the novel coronavirus, unlike other infectious diseases such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), is that it can cause asymptomatic infections or infections with very mild symptoms. [6][7][8] Responses to COVID-19 in South Asian Association for Regional Cooperation (SAARC) nations in 2020 were reported and examined with available data in a recent study by Malik et al. 9 For the months of April 2020 to December 2020, the authors used the exponential growth approach to calculate the reproductive number $R_t$. Different epidemiological model parameters were also used to establish a correlation with the COVID-19 transmission pattern.
According to this study, immunization and accurate advice on COVID-19 prevention measures may help to reduce the risk of infection. Similar work on COVID-19 epidemic trend analysis has been proposed by Malek and Hoque. 10 The authors explored the COVID-19 trend in South Asian countries using the SEIATR model, an upgraded version of the SEIR system. However, their reported results anticipated COVID-19 cases only until June 2020 and only for four nations: India, Pakistan, Bangladesh, and Afghanistan. In another study, Tiwari et al. 11 provided India-centric mathematical modeling of COVID-19 spread that focused on the repercussions of lockdown. To study the nationwide lockdown in India, the authors used an SEIRD model with five compartments. Overall, the article argued that a countrywide lockdown is an effective control policy for COVID-19 spread. To analyze the COVID-19 spread across India, the authors in Reference 12 presented a regression analysis model using a dataset available at Kaggle. However, the developed models are only capable of forecasting the next 6 days of COVID-19 spread, and among the six models used, sixth-degree polynomial regression outperformed the others. In another study, the authors in Reference 13 used ML-guided piecewise linear regression to predict COVID-19 cases across India. According to the findings, the trend of COVID-19 cases climbed linearly and then exponentially. However, only six states of India were considered, and cases were projected only for the next 50 days. To estimate the spread of COVID-19 cases in India, the authors in Reference 14 suggested a deep LSTM architecture and attained an accuracy of 97.59% using WHO data for India. Moreover, that study thoroughly reviewed the available worldwide datasets for COVID-19.
The authors in Reference 15 provide a time-dependent susceptible-infected-recovered (SIR) model that is capable of estimating transmission and recovery rates using data provided by China. This research also proposed two distinct approaches to social distancing in order to minimize the effective reproduction number. The study covered only eight countries in five separate projection categories: the United States, the United Kingdom, France, Iran, Spain, Italy, Germany, and the Republic of Korea. In these circumstances, the rate of transmission among a large number of people can increase very quickly. According to the latest World Health Organization survey, only 87.9% of COVID-19 patients have a fever, and 67.7% have a dry cough. 16 It is therefore crucial to estimate the intensity of the COVID-19 epidemic and to predict its time course, peak time, total duration, and so on. Recently, the authors in Reference 17 investigated and projected COVID-19 cases using three distinct models (SIR, SEIQR, and ML), concentrating on three nations (Australia, the United Kingdom, and the United States). Among the ML models, the research found that the Prophet algorithm fared better in the United Kingdom and the United States than in Australia. However, to the best of our knowledge, no previous research has been conducted to predict COVID-19 cases across all the countries of the South Asian region. Therefore, our study focuses on COVID-19 case prediction using a modified SIR (M-SIR) model in the SAARC countries. SAARC is an intergovernmental organization and geopolitical union of states in South Asia. Its member states are Afghanistan, Bangladesh, Bhutan, India, the Maldives, Nepal, Pakistan, and Sri Lanka. Our aim is to develop a prediction model for the SAARC countries to understand the epidemiological trend of the novel coronavirus outbreak in these countries.
Here we explore a modified version of the SIR epidemic model to predict the short-term (20 days), mid-term (90 days), and long-term (180 days) evolution of the COVID-19 situation in the countries of the SAARC region.

METHODOLOGY
The research methodology is divided into several phases. First, appropriate data were collected from four different sources: Johns Hopkins University (JHU), the COVID-19 Dataset available at Kaggle, the COVID-19 Open Research Dataset, and the population by country 2020 dataset. The data were then evaluated and preprocessed to eliminate any duplicate or missing values. This section also describes the working principle of the M-SIR model and then discusses the proposed algorithm of this research.

Dataset analysis
Our analysis and model building draw on four data sources: the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (January-May 2020), 18 the COVID-19 Dataset (January-May 2020), 19 the COVID-19 Open Research Dataset (CORD-19), 20 and the population by country 2020 dataset. 21 Table 1 describes each dataset and its data files, together with their column descriptions. We also compiled a custom dataset to develop and evaluate our model.

M-SIR model
Predictive mathematical disease models are important for understanding the trajectory of the outbreak and for preparing successful response strategies. One commonly used model is the human-to-human transmission SIR model, which defines people's flow through three mutually exclusive stages of infection: susceptible (S), infected (I), and recovered (R).
Most epidemic models are based on dividing the population into a small number of compartments, within each of which individuals are identical in terms of their status with respect to the disease. The SIR model is based on three compartments. The susceptible (S) class contains those who are susceptible to infection; this can include passively immune individuals once they lose their immunity. In the infected (I) class, the pathogen level within the host is large enough that the infection can spread to other susceptible people. The recovered (R) class includes all individuals who have been infected and have recovered. This epidemiological model captures the dynamics of acute infections that, after recovery, confer lifelong immunity. In general, the overall size of the population is considered constant, $N = S + I + R$. Two cases can be distinguished, according to whether demographic factors are included or excluded. Let us assume that, in the SIR model, there is a natural host lifespan of $1/\mu$ years. Then the rate at which individuals in any epidemiological compartment suffer natural mortality is given by $\mu$.
It is important to emphasize that this factor is independent of the disease and should not reflect the pathogenicity of the infectious agent. Over time, the dynamics can be expressed as in Equation (1).
The SIR model can thus be defined using Equation (2), with the initial conditions S(0) > 0, I(0) ≥ 0, and R(0) ≥ 0. It is important to introduce the basic reproduction number $R_0$ for this model. $R_0$ is the parameter that indicates whether or not a disease will spread through the population. If the estimated $R_0 < 1$, we can assume that the disease will die out, and if $R_0 = 1$, the disease remains in the system and is stable. But if $R_0 > 1$, the disease will spread and cause an outbreak. The higher the value of $R_0$, the more difficult the outbreak is to control.
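For reference, the textbook form of an SIR model with demography that matches the description above can be written as follows. This is a sketch of the standard formulation and not necessarily the exact form of Equations (1) and (2) in this article; the symbols $\beta$ (transmission rate), $\gamma$ (recovery rate), and $\mu$ (natural mortality rate, assumed equal to the birth rate so that $N$ stays constant) are assumptions.

```latex
\begin{align}
\frac{dS}{dt} &= \mu N - \frac{\beta S I}{N} - \mu S, \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I - \mu I, \\
\frac{dR}{dt} &= \gamma I - \mu R,
\end{align}
\qquad N = S + I + R, \qquad R_0 = \frac{\beta}{\gamma + \mu}.
```

Summing the three equations gives $dN/dt = 0$, so the population size is conserved. When demography is excluded ($\mu = 0$), the threshold quantity reduces to $R_0 = \beta/\gamma$.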

Proposed model and algorithm
In this study, we use an SIR epidemic model. In a general SIR model, the transmission rate ($\beta$) and the recovery rate ($\gamma$) are treated as two time-invariant parameters. Moreover, several studies have shown that an SIR model represents the information contained in confirmed case data much better than an SEIR model. 22 We therefore developed the model presented in Figure 1, which can dynamically adjust the crucial parameters while working on time-varying data and which we refer to as the M-SIR model. 23 In a basic SIR model, by contrast, the reproduction number ($R_0$) is simply the ratio of the transmission rate to the recovery rate, $R_0 = \beta/\gamma$, as shown in Equation (3).
To build the model, we made the basic reproduction number $R_0$ depend on time ($t$). Equation (4) represents how it changes with time ($t$).
To build our model effectively, we also distinguished between detectable (index 1) and nondetectable (index 2) infected persons, as shown in Equation (5).
In general, detectable persons have a lower transmission rate than nondetectable persons. We therefore calculated the transmission rate ($\beta$) and recovery rate ($\gamma$) for each country of the SAARC region. For instance, Bangladesh has 165 million people, 24 and the first confirmed case in the country was reported on March 7, 2020. As of May 30, 2020, a total of 42,844 people infected with COVID-19 had been detected in the country. 25 According to the M-SIR model, Bangladesh has a transmission rate of 0.63 and a recovery rate of 0.49. The initial reproduction number for the country is 1.27. Based on the available data sources for every SAARC country, we calculated these parameters for the prediction model, as shown in Table 3. Algorithm 1 presents the working procedure of our proposed M-SIR prediction model.
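As a quick arithmetic check of the ratio described above, the following minimal sketch computes the reproduction number from a transmission and a recovery rate and classifies it against the threshold of 1. The function names are hypothetical and are not taken from the article. With the rounded Bangladesh rates quoted above (0.63 and 0.49), the plain ratio is about 1.29, which is consistent with the reported 1.27 once rounding of the published rates is taken into account.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """Return R0 = beta / gamma for a basic SIR model without demography."""
    if beta < 0 or gamma <= 0:
        raise ValueError("beta must be non-negative and gamma must be positive")
    return beta / gamma


def outbreak_regime(r0: float, tol: float = 1e-9) -> str:
    """Classify R0 against the epidemic threshold of 1."""
    if r0 < 1 - tol:
        return "dies out"
    if r0 > 1 + tol:
        return "spreads"
    return "stable"


if __name__ == "__main__":
    # Rounded rates for Bangladesh as quoted in the text.
    r0 = basic_reproduction_number(0.63, 0.49)
    print(f"R0 = {r0:.2f} ({outbreak_regime(r0)})")
```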

Algorithm 1. M-SIR prediction model
In the conventional SIR model, the transmission rate $\beta$ and the recovery rate $\gamma$ are two time-invariant parameters. The transmission rate $\beta$ denotes that each person has on average $\beta$ contacts with randomly selected others per unit of time. The recovery rate $\gamma$, on the other hand, means that infected individuals recover or die at a constant average rate $\gamma$. Because the traditional SIR model ignores the time-varying nature of $\beta$ and $\gamma$, this assumption is too simplistic to anticipate the disease's progression accurately and effectively. 15 For this reason, to make the model robust, a time-dependent SIR (M-SIR) model is proposed, in which both the transmission rate $\beta(t)$ and the recovery rate $\gamma(t)$ are functions of time $t$. This type of M-SIR model is therefore better suited to predicting disease spread, control, and future trends.
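The idea of a time-dependent SIR model can be illustrated with a short simulation. The sketch below is not Algorithm 1 itself, whose steps are not reproduced in the text. It assumes a daily discrete-time update, and the exponentially decaying transmission rate in the usage example is a purely hypothetical choice meant to mimic the effect of interventions.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float, float]  # (susceptible, infected, recovered)


def simulate_time_dependent_sir(
    beta: Callable[[int], float],
    gamma: Callable[[int], float],
    s0: float,
    i0: float,
    r0: float,
    days: int,
) -> List[State]:
    """Simulate an SIR model whose rates beta(t) and gamma(t) vary by day."""
    n = s0 + i0 + r0  # total population, kept constant by construction
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for t in range(days):
        # Clamp the flows so that no compartment can become negative.
        new_infections = min(beta(t) * s * i / n, s)
        new_removals = min(gamma(t) * i, i)
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    population = 165_000_000  # Bangladesh population quoted in the text
    # Hypothetical rate profiles: beta decays over time, gamma stays constant.
    path = simulate_time_dependent_sir(
        beta=lambda t: 0.63 * math.exp(-0.02 * t),
        gamma=lambda t: 0.49,
        s0=population - 100,
        i0=100,
        r0=0,
        days=180,
    )
    peak_day = max(range(len(path)), key=lambda d: path[d][1])
    print(f"Active infections peak on day {peak_day}")
```

Passing the rates as functions of the day index keeps the update rule unchanged when a different time profile is fitted or assumed.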

RESULTS
We performed our simulations and tabulated the predicted results (short term, midterm, and long term) for all countries of the SAARC region. We explored the 3D parameter space using gradient descent to minimize the error. In addition, a lag parameter was used to account for the differing dates of the first confirmed cases across the SAARC countries. The M-SIR model was computed using MATLAB R2020a, and Tableau data visualization tools were used to present the estimated data in a more dynamic manner. Table 3 helps to understand the current (as of May 30, 2020) scenario of the countries (Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka) in terms of their ratios of confirmed cases and deaths. The comparatively slow growth of infections in the South Asian region could be a result of the low number of tests and the testing strategy. 26 Initially, testing was limited to specific individuals who had come from high-risk countries. Even their immediate contacts were at first not tested. The initial growth rate of COVID-19 infection in the SAARC region is comparatively lower than in countries such as the US, France, Germany, Spain, China, and Italy. However, based on the current scenario in the SAARC region (as of May 30, 2020), India has reported the highest number of confirmed COVID-19 cases. A short-term model prediction for the next 20 days (until June 19, 2020) is illustrated in Figure 3 for all SAARC countries. Similarly, we also predicted the epidemic curves of the SAARC countries for the midterm (90 days) and long term (180 days). From the 20-day prediction, we noticed that India, Bangladesh, Sri Lanka, and Pakistan are gradually improving, with active cases declining and recovery rates rising over time. However, in this short period, the confirmed cases of these countries also continue to increase until June 19, 2020.
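The parameter search mentioned above can be sketched with a small gradient descent routine. The article does not specify the loss function, the three dimensions of the parameter space, or how the lag parameter enters, so everything below is an assumption: the sketch fits only the transmission and recovery rates, uses a mean squared error on the logarithm of the infected fraction, estimates gradients by finite differences, and halves the step size until the loss decreases. The "observed" data are synthetic and generated by the same model.

```python
import math
from typing import List, Tuple


def infected_curve(beta: float, gamma: float, i0: float, days: int) -> List[float]:
    """Daily infected fraction from a discrete SIR model (population normalized to 1)."""
    s, i = 1.0 - i0, i0
    curve = [i]
    for _ in range(days):
        new_infections = min(beta * s * i, s)
        new_removals = min(gamma * i, i)
        s -= new_infections
        i += new_infections - new_removals
        curve.append(i)
    return curve


def log_mse(model: List[float], observed: List[float]) -> float:
    """Mean squared error between log-infected curves (floored to avoid log(0))."""
    floor = 1e-12
    return sum(
        (math.log(max(m, floor)) - math.log(max(o, floor))) ** 2
        for m, o in zip(model, observed)
    ) / len(observed)


def fit_sir(
    observed: List[float], i0: float, beta: float, gamma: float,
    lr: float = 0.1, iterations: int = 200, eps: float = 1e-6,
) -> Tuple[float, float, float]:
    """Gradient descent on (beta, gamma) with a backtracking step size."""
    days = len(observed) - 1

    def loss(b: float, g: float) -> float:
        return log_mse(infected_curve(b, g, i0, days), observed)

    current = loss(beta, gamma)
    for _ in range(iterations):
        # Central finite-difference estimate of the gradient.
        grad_b = (loss(beta + eps, gamma) - loss(beta - eps, gamma)) / (2 * eps)
        grad_g = (loss(beta, gamma + eps) - loss(beta, gamma - eps)) / (2 * eps)
        step, accepted = lr, False
        while step > 1e-12:
            new_b, new_g = beta - step * grad_b, gamma - step * grad_g
            if new_b > 0 and new_g > 0:
                candidate = loss(new_b, new_g)
                if candidate < current:  # accept only improving steps
                    beta, gamma, current = new_b, new_g, candidate
                    accepted = True
                    break
            step /= 2
        if not accepted:
            break  # no improving step found; stop early
    return beta, gamma, current


if __name__ == "__main__":
    observed = infected_curve(0.63, 0.49, 1e-4, 40)  # synthetic "data"
    beta_hat, gamma_hat, final_loss = fit_sir(observed, 1e-4, beta=0.5, gamma=0.4)
    print(f"beta={beta_hat:.3f}, gamma={gamma_hat:.3f}, loss={final_loss:.2e}")
```

Early epidemic data mainly constrain the difference between the two rates, so a fit on infected counts alone may recover the growth rate much more reliably than the individual values of the two parameters.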
A comprehensive statistical analysis of the long-term prediction has been carried out for each country of the SAARC region. To highlight the results, this research considered several descriptive statistical measures, including the mean, median, standard error, standard deviation, range, minimum, maximum, and quartile values. Based on the prediction, Afghanistan, Bangladesh, India, and Pakistan are considered the countries most severely affected by COVID-19 (Table 4).
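The descriptive measures listed above can be computed with Python's standard library, as in the following minimal sketch. The input series is made up for illustration and is not data from this study. The use of the sample standard deviation and of the default quartile method of `statistics.quantiles` are assumptions, since the article does not state which conventions were used.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Return common descriptive statistics for a series of predicted case counts."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    std_dev = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_error": std_dev / math.sqrt(len(values)),
        "std_dev": std_dev,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


if __name__ == "__main__":
    # Made-up illustrative values, not taken from the article's tables.
    example = [2, 4, 4, 4, 5, 5, 7, 9]
    for name, value in describe(example).items():
        print(f"{name}: {value:.3f}")
```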

DISCUSSION
The US, Italy, Spain, Germany, and China are among the countries most affected by the COVID-19 pandemic. Although the novel coronavirus originated in and started to spread from China, countries in the South Asian region are also being infected faster than other countries globally. The overall population density of the countries in the SAARC region is also very high. Table 5 lists various parameters (population, land area, density, and world share) that are relevant to analyzing the spread of the novel coronavirus in this region.
We further extended our prediction model to the next 90 days and 180 days, respectively. In addition, the population of each country is treated as a model parameter, because the entire population of a country cannot be infected. If the whole population were treated as infected, the probable number of infected persons would remain unknown. Based on the prediction model, we tabulated the predicted results for all SAARC countries for the short term, midterm, and long term, respectively (Table 6).
A midterm (90 days) prediction is further assessed until August 31, 2020, and the shape of the epidemic curve is depicted in Figure 4. For all the countries of the SAARC region, we predicted the total numbers of confirmed cases, deaths, recoveries, and active cases.
Lastly, a long-term prediction for the next 180 days (until November 30, 2020) is presented in Figure 5. According to the prediction model, active cases will fall to zero at the end of June in Sri Lanka and at the beginning of August in India (Figure 5). However, Bangladesh, the Maldives, and Pakistan will take a few more months to reduce their active cases. On the other hand, countries like Afghanistan, Nepal, and Bhutan will show a steep increase in their active cases.

TABLE 5 Population, total area, and density of SAARC countries with their individual world share 21
Columns: Countries; Population; Land area (km²); Density (P/km²); World share (%)

• Both the confirmed and recovered cases will increase with a similar trend
• Death cases will also fall to zero, as will active cases
• Confirmed cases will notably increase
• However, the death cases will jump to 2955 from zero in the long-term prediction
• In the mid-term and long-term forecast, active cases will become zero

FIGURE 4 M-SIR prediction model: prediction statistics for the upcoming 90 days for all the countries in the SAARC region. The different predicted case scenarios (confirmed, death, recovered, and active) until August 31, 2020, help to understand the epidemiological nature of COVID-19 in the South Asian region. The model indicates that over the next 90 days the prediction curve will increase sharply for India (in terms of confirmed cases and deaths). Moreover, India will also show a substantial increase in its recovery rate. However, the number of active cases in Pakistan will increase until August 31, 2020.
Since the classic forms of the SIR model are deterministic, an improved version based on parameter optimization is suggested to improve the prediction. Moreover, all the forecasts presented in this article for the countries of the SAARC region were made without considering quarantine and social distancing conditions. In addition, the analysis is based on the COVID-19 case data available until May 30, 2020. Therefore, the current COVID-19 case trend is radically different from what this article predicted. Given these challenges, this research can be extended in the future by employing other epidemiological models (e.g., SEIQR) along with machine learning architectures (e.g., regression analysis). Future studies are also encouraged to consider social distancing and quarantine, along with other observations, including policy actions, human behavior, and restrictions, that have the potential to improve forecast accuracy.

CONCLUSION
The COVID-19 epidemic has brought unprecedented health concerns for communities all around the globe. This research predicted the epidemic trend of COVID-19 in SAARC countries for short-term, midterm, and long-term horizons. We explored the transmission pattern of the novel coronavirus using the M-SIR compartmental epidemic model, a modified version of the SIR epidemic model. This study also calculated the transmission rate, recovery rate, and reproduction number for eight South Asian countries. In addition, nine different statistical metrics were analyzed, and it was determined that Afghanistan, Bangladesh, India, and Pakistan will continue to be the most affected countries in the SAARC area through November 2020. To the best of our knowledge, this study presents the very first COVID-19 prediction model focused on the countries of the SAARC region. This epidemic modeling can be a helpful tool for estimating and predicting the scale and time course of COVID-19, evaluating the effectiveness of public health interventions, and informing public health policies in SAARC countries. In the future, machine learning tools can be used to identify and optimize the time profile of the confinement.

CONFLICT OF INTEREST
The authors declare that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author, upon reasonable request.